Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
We describe the CoNLL-2003 shared task: language-independent named entity
recognition. We give background information on the data sets (English and
German) and the evaluation method, present a general overview of the systems
that have taken part in the task, and discuss their performance.
Combined optimization of feature selection and algorithm parameters in machine learning of language
Comparative machine learning experiments have become an important methodology in empirical approaches to natural language processing (i) to investigate which machine learning algorithms have the 'right bias' to solve specific natural language processing tasks, and (ii) to investigate which sources of information add to accuracy in a learning approach. Using automatic word sense disambiguation as an example task, we show that with the methodology currently used in comparative machine learning experiments, the results may often not be reliable because of the role of and interaction between feature selection and algorithm parameter optimization. We propose genetic algorithms as a practical approach to achieve both higher accuracy within a single approach, and more reliable comparisons.
A named entity recognition system for Dutch
We describe a Named Entity Recognition system for Dutch that combines gazetteers, hand-crafted rules, and machine learning on the basis of seed material. We used gazetteers and a corpus to construct training material for Ripper, a rule learner. Instead of using Ripper to train a complete system, we used many different runs of Ripper in order to derive rules which we then interpreted and implemented in our own, hand-crafted system. This sped up the building of a hand-crafted system, and allowed us to use many different rule sets in order to improve performance. We discuss the advantages of using machine learning software as a tool in knowledge acquisition, and evaluate the resulting system for Dutch.
Memory-Based Named Entity Recognition Using Unannotated Data
We used the memory-based learner Timbl (Daelemans et al., 2002) to find names in English and German newspaper text. A first system used only the training data, and a number of gazetteers. The results show that gazetteers are not beneficial in the English case, while they are for the German data. Type-token generalization was applied, but also reduced performance.
Diversity Checker: Toward recommendations for improving journalism with respect to diversity
The Diversity Checker is a tool that aims to make it easier for journalists to author their texts with diversity in mind. To provide helpful hints for them in this respect, it is necessary to define how to quantify diversity so that this can be programmed into the tool. At this early stage in the development of the tool, we present a two-fold contribution. First, we offer an analysis of what we mean by "improving diversity". Second, we present the first version of the Diversity Checker, along with some analysis of its current performance.